
Refactor usage polling to paged APIs with legacy fallback #138

Open
zly2006 wants to merge 7 commits into seakee:main from zly2006:optimize-usage-conditional-requests

zly2006 commented May 24, 2026


Changes

  • Split the monitoring page's high-frequency usage polling from the full /v0/management/usage payload into smaller paged/aggregate endpoints: summary, accounts, api-keys, realtime, and models.
  • /v0/management/usage/summary now returns only global aggregate data, including total requests, success/failure counts, tokens, and latency aggregates, without detailed breakdowns.
  • Added a paged model breakdown endpoint and compute estimated cost from the loaded model aggregates instead of relying on summary details.
  • Kept legacy backend compatibility: when the paged endpoints are unavailable, the UI still falls back to the old /v0/management/usage payload shape.
  • Manual refresh and the 5-second auto refresh now refresh all new usage endpoints and no longer poll the full /v0/management/usage payload.
  • Added a hard backend cap on page_size and backend whitelist validation for sort_key / sort_direction, so oversized pages are rejected and user input cannot directly affect the sorting logic.
  • Replaced LIKE matching in usage search with SQLite FTS5 and included API key aliases in the searchable text; search now matches either the FTS text or the API key hash, which fixes alias searches that were incorrectly filtered out by the hash condition.
  • Fixed stale relative time ranges where end_ms was frozen after the first render, which could keep total tokens and estimated cost unchanged after the first refresh.
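The refresh and end_ms changes above can be sketched as follows. This is a minimal TypeScript sketch, not the PR's code: `TimeRange`, `resolveRange`, `refreshAll`, and the injected `fetchEndpoint` callback are hypothetical names, and the real `useUsageData` / `useMonitoringData` hooks may be organized differently.

```typescript
// Hypothetical range types; the real monitoring hooks may model this differently.
type TimeRange =
  | { kind: "relative"; windowMs: number } // e.g. "last 24 hours"
  | { kind: "absolute"; startMs: number; endMs: number };

interface ResolvedRange {
  start_ms: number;
  end_ms: number;
}

// Resolve the window when the request is made, not when the component first
// renders, so a relative range keeps moving forward on every refresh.
function resolveRange(range: TimeRange, now: number = Date.now()): ResolvedRange {
  if (range.kind === "absolute") {
    return { start_ms: range.startMs, end_ms: range.endMs };
  }
  return { start_ms: now - range.windowMs, end_ms: now };
}

const USAGE_ENDPOINTS = ["summary", "accounts", "api-keys", "realtime", "models"] as const;
type UsageEndpoint = (typeof USAGE_ENDPOINTS)[number];

// One refresh tick (manual or the 5-second timer): resolve the window once and
// query every paged/aggregate endpoint in parallel with that same window.
async function refreshAll(
  range: TimeRange,
  fetchEndpoint: (name: UsageEndpoint, resolved: ResolvedRange) => Promise<unknown>,
  now: number = Date.now(),
): Promise<Record<UsageEndpoint, unknown>> {
  const resolved = resolveRange(range, now);
  const results = await Promise.all(USAGE_ENDPOINTS.map((name) => fetchEndpoint(name, resolved)));
  const byEndpoint = {} as Record<UsageEndpoint, unknown>;
  USAGE_ENDPOINTS.forEach((name, i) => {
    byEndpoint[name] = results[i];
  });
  return byEndpoint;
}
```

Because `resolveRange` takes `now` as an argument on every call, a relative window gets a fresh `end_ms` on each tick instead of reusing the value captured at first render.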

Verification

  • go test ./...
  • go test ./internal/store ./internal/httpapi
  • npm test -- --run src/features/monitoring/hooks/useUsageData.test.ts src/services/api/usageService.test.ts src/features/monitoring/hooks/useMonitoringData.test.ts src/features/monitoring/accountOverviewState.test.ts
  • npm test -- --run src/features/monitoring/hooks/useMonitoringData.test.ts
  • npm run type-check
  • npm run lint
  • npm run build
  • Verified through browser CDP on port 9223 that monitoring refresh only requests summary/accounts/api-keys/realtime/models and no longer calls the full /v0/management/usage endpoint.
  • Verified through browser CDP on port 9223 that searching for KongWenpeng still shows matching calls, the caller API key table, and estimated cost instead of clearing the page to zero.
  • After deploying to off, verified that invalid page_size=501 and invalid sort_key return 400, while valid page_size=500&sort_key=lastSeenAt returns 200.

zly2006 force-pushed the optimize-usage-conditional-requests branch from 3ff8db6 to 07aceca on May 24, 2026 22:18
zly2006 changed the title "Reduce usage polling payloads with conditional requests" to "Reduce usage polling payloads with conditional & aggregate requests" on May 24, 2026
zly2006 force-pushed the optimize-usage-conditional-requests branch 4 times, most recently from 0ae6c22 to d3ddf09, on May 24, 2026 22:49
zly2006 marked this pull request as draft on May 24, 2026 22:52
zly2006 force-pushed the optimize-usage-conditional-requests branch 4 times, most recently from 1d3000d to 1e2f1ec, on May 24, 2026 23:41
zly2006 changed the title "Reduce usage polling payloads with conditional & aggregate requests" to "Refactor usage polling to summary API with legacy fallback" on May 24, 2026
zly2006 force-pushed the optimize-usage-conditional-requests branch from 1e2f1ec to 522d7f1 on May 25, 2026 00:04
zly2006 changed the title "Refactor usage polling to summary API with legacy fallback" to "Refactor usage polling to paged APIs with legacy fallback" on May 25, 2026
zly2006 force-pushed the optimize-usage-conditional-requests branch from 522d7f1 to 9417d6d on May 25, 2026 00:10
zly2006 marked this pull request as ready for review on May 25, 2026 00:13
zly2006 (Author) commented May 25, 2026

@seakee could you please review?

seakee (Owner) commented May 26, 2026


Thanks for the PR.

The current PR does help with the slow loading, timeouts, and heavy frontend aggregation on the request monitoring page under large data volumes. The follow-up refresh fix is also valuable: it stops end_ms from being frozen under a relative time range, which would otherwise leave the token and cost statistics unchanged after a refresh.

However, this PR affects the core data path of CPA-Manager request monitoring, so I’m not recommending merging it directly just yet. A few things still need to be addressed:

  1. Although the paginated endpoint has already been split out, the /summary endpoint may still return a large number of details grouped by dimensions such as account, provider, model, api key hash, source, etc. When there are many combinations of accounts, models, and API keys, the summary itself could still become a large payload. I’d suggest making the summary return only top-level statistics and offloading all detailed/breakdown data to the paginated endpoint as much as possible, or at least adding a limit / lazy-loading parameter.

  2. Please confirm that the backend enforces a hard upper limit on page_size, so excessively large values cannot cause a denial of service or put too much pressure on SQLite. In addition, sort_key / sort_direction must be mapped through a backend whitelist and must never be concatenated directly from user-supplied values.

  3. Currently, the search runs a LIKE "%term%" query across multiple fields. Because the pattern starts with a wildcard, SQLite cannot use an index for it, so it could be quite slow under large data volumes. I'd suggest documenting the performance boundaries for a large time range combined with search, restricting the time range when necessary, or considering indexes / FTS optimization later.

  4. GitHub is flagging hidden / bidirectional Unicode characters that need to be handled. Hidden Unicode characters should not remain in core code. Please clean them up or explain their specific source and impact.

  5. At this point, the change is no longer just a frontend refactoring; it introduces a set of stable management APIs. Tests need to be added for summary, pagination, filtering, sorting, page size limits, legacy fallback, etc., to avoid discrepancies in request monitoring statistics down the line.
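The page_size cap and sort whitelist asked for in item 2 might look like the sketch below. The real handlers are in Go under internal/httpapi; this is written in TypeScript only to keep one language across these notes. The limit of 500 matches the PR's own verification (501 rejected, 500 accepted) and lastSeenAt is the sort key the PR tested; the other sort keys, SQL column names, and defaults are hypothetical.

```typescript
// Hard cap: the PR verifies that page_size=501 is rejected and 500 is accepted.
const MAX_PAGE_SIZE = 500;
const DEFAULT_PAGE_SIZE = 50; // hypothetical default

// Whitelists: API-facing names map to fixed SQL fragments. lastSeenAt comes from
// the PR; the other keys and every column name are hypothetical. A Map (not a
// plain object) avoids matching inherited keys such as "constructor".
const SORT_COLUMNS = new Map<string, string>([
  ["lastSeenAt", "last_seen_at"],
  ["totalRequests", "total_requests"],
  ["totalTokens", "total_tokens"],
]);
const SORT_DIRECTIONS = new Map<string, string>([
  ["asc", "ASC"],
  ["desc", "DESC"],
]);

interface PageParams {
  limit: number;
  offset: number;
  orderBy: string; // built only from whitelist values, never from raw input
}

type ParseResult = { ok: true; params: PageParams } | { ok: false; error: string };

function parsePageParams(query: Record<string, string | undefined>): ParseResult {
  const page = Number(query.page ?? "1");
  const pageSize = Number(query.page_size ?? String(DEFAULT_PAGE_SIZE));
  if (!Number.isInteger(page) || page < 1) {
    return { ok: false, error: "invalid page" };
  }
  if (!Number.isInteger(pageSize) || pageSize < 1 || pageSize > MAX_PAGE_SIZE) {
    return { ok: false, error: `page_size must be between 1 and ${MAX_PAGE_SIZE}` };
  }
  const column = SORT_COLUMNS.get(query.sort_key ?? "lastSeenAt");
  const direction = SORT_DIRECTIONS.get(query.sort_direction ?? "desc");
  if (column === undefined || direction === undefined) {
    return { ok: false, error: "invalid sort_key or sort_direction" };
  }
  return {
    ok: true,
    params: { limit: pageSize, offset: (page - 1) * pageSize, orderBy: `${column} ${direction}` },
  };
}
```

A handler would turn `{ ok: false }` into HTTP 400. Since `orderBy` can only be one of the fixed strings from the maps, no user text reaches the SQL. For stable LIMIT/OFFSET paging it is also worth appending a unique tie-breaker column to the ORDER BY, because rows with equal sort values have no guaranteed order.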

zly2006 (Author) commented May 26, 2026

Thanks for your review, I will fix them.

zly2006 force-pushed the optimize-usage-conditional-requests branch from d741d30 to a34cfbb on May 26, 2026 11:27
zly2006 (Author) commented May 26, 2026

I have implemented the suggestions above. I tested it manually and it works very well, but I have not read through the code by hand yet; I will review it later.

One thing I want to check with you: after switching to FTS5, users cannot search with partial queries. For example, if I want to find codex usage, searching for "co" or "code" does not find it; I have to type the full word. Is this good enough for the user experience? @seakee

zly2006 force-pushed the optimize-usage-conditional-requests branch from a34cfbb to 2f5b26c on May 26, 2026 11:31
seakee (Owner) commented May 26, 2026

First of all, thank you again for this PR and for the follow-up updates.

I agree with the overall direction, and I can see that you have already addressed a number of earlier review points. Splitting the monitoring page from high-frequency polling of the full /v0/management/usage payload into smaller aggregate/paged endpoints such as summary / accounts / api-keys / realtime / models is the right direction for improving CPA-Manager’s request monitoring.

One extra note: if this PR is eventually merged into CPA-Manager, these usage aggregation/pagination APIs will very likely be referenced or adapted in CPA-Manager-Plus as well. CPA-Manager-Plus is also improving request monitoring, data summaries, caller API key statistics, and account-level statistics, so having this API design stabilized here would be directly useful there too.

You are also very welcome to continue contributing PRs to CPA-Manager-Plus. Once the usage query protocol, pagination shape, FTS search behavior, and model cost aggregation are polished here, reusing them in Plus should be quite natural.

That said, because this PR touches the core data path of request monitoring, I still think a few issues should be addressed before merge:

  1. loadModelPages should always start from page 1

Currently the first request uses usagePageQueries.models as-is. If the caller ever passes models.page = 3, it would fetch page 3 first, then page 2..N, skipping page 1 and potentially double-counting page 3.

Even if the current caller always passes page 1, the helper contract is still unsafe. Please either force the first request to use page: 1, or remove page from the model aggregate query type and make it explicit that this path performs a full model aggregate traversal.
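Both options can be combined: drop `page` from the query type and hard-code page 1 for the first request, then fetch pages 2..N. This is a hypothetical sketch; the `ModelRow` / `ModelPage` shapes and the injected `fetchPage` function are assumptions, not the actual usageService types.

```typescript
// Hypothetical shapes; the real usageService types may differ.
interface ModelRow {
  model: string;
  total_requests: number;
  total_tokens: number;
}
interface ModelPage {
  items: ModelRow[];
  total_pages: number;
}

// No `page` field: this helper always traverses the full model aggregate.
interface ModelAggregateQuery {
  page_size: number;
  start_ms: number;
  end_ms: number;
}

type FetchModelPage = (query: ModelAggregateQuery & { page: number }) => Promise<ModelPage>;

async function loadModelPages(fetchPage: FetchModelPage, query: ModelAggregateQuery): Promise<ModelRow[]> {
  // `page: 1` is written after the spread, so even an object that carries a
  // stray `page` property at runtime cannot change the starting page.
  const first = await fetchPage({ ...query, page: 1 });
  const remaining: Promise<ModelPage>[] = [];
  for (let page = 2; page <= first.total_pages; page++) {
    remaining.push(fetchPage({ ...query, page }));
  }
  const rows: ModelRow[] = [...first.items];
  for (const next of await Promise.all(remaining)) {
    rows.push(...next.items);
  }
  return rows;
}
```

Each page is requested exactly once, so no page can be skipped or counted twice regardless of what the caller passes.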

  2. mergeUsagePayloads is incomplete

It currently merges only total_requests / success_count / failure_count / total_tokens, but not tokens.*, latency_sum_ms, or latency_count.

Also, when merging apis.models, it appears to assign by model key. If different pages contain the same endpoint + model but different resolved model / failed dimensions, the later page may overwrite previous details. Please append details instead of overwriting, and recompute tokens / latency / totals consistently.
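A merge that covers the missing fields and appends details instead of overwriting them could look like this. The payload shape is a hypothetical simplification: the token sub-fields (`input`, `output`, `reasoning`, `cached`) and the `ModelDetail` fields are assumptions, and only the names quoted in the review (`total_requests`, `success_count`, `failure_count`, `total_tokens`, `tokens.*`, `latency_sum_ms`, `latency_count`, `apis.models`) come from it.

```typescript
// Hypothetical, simplified payload shape.
interface TokenBreakdown {
  input: number;
  output: number;
  reasoning: number;
  cached: number;
}
interface ModelDetail {
  resolved_model: string;
  failed: boolean;
  requests: number;
  total_tokens: number;
}
interface ModelBucket {
  total_requests: number;
  total_tokens: number;
  details: ModelDetail[];
}
interface UsagePayload {
  total_requests: number;
  success_count: number;
  failure_count: number;
  total_tokens: number;
  tokens: TokenBreakdown;
  latency_sum_ms: number;
  latency_count: number;
  apis: Record<string, { models: Record<string, ModelBucket> }>;
}

function addTokens(a: TokenBreakdown, b: TokenBreakdown): TokenBreakdown {
  return {
    input: a.input + b.input,
    output: a.output + b.output,
    reasoning: a.reasoning + b.reasoning,
    cached: a.cached + b.cached,
  };
}

// Same endpoint + model on two pages: sum the counters and append the details
// (they may differ by resolved_model or failed); never replace the bucket.
function mergeBucket(existing: ModelBucket | undefined, incoming: ModelBucket): ModelBucket {
  if (existing === undefined) {
    return { ...incoming, details: [...incoming.details] };
  }
  return {
    total_requests: existing.total_requests + incoming.total_requests,
    total_tokens: existing.total_tokens + incoming.total_tokens,
    details: [...existing.details, ...incoming.details],
  };
}

function mergeUsagePayloads(pages: UsagePayload[]): UsagePayload {
  const merged: UsagePayload = {
    total_requests: 0,
    success_count: 0,
    failure_count: 0,
    total_tokens: 0,
    tokens: { input: 0, output: 0, reasoning: 0, cached: 0 },
    latency_sum_ms: 0,
    latency_count: 0,
    apis: {},
  };
  for (const page of pages) {
    merged.total_requests += page.total_requests;
    merged.success_count += page.success_count;
    merged.failure_count += page.failure_count;
    merged.total_tokens += page.total_tokens;
    merged.tokens = addTokens(merged.tokens, page.tokens);
    merged.latency_sum_ms += page.latency_sum_ms;
    merged.latency_count += page.latency_count;
    for (const [endpoint, api] of Object.entries(page.apis)) {
      let target = merged.apis[endpoint];
      if (target === undefined) {
        target = { models: {} };
        merged.apis[endpoint] = target;
      }
      for (const [model, bucket] of Object.entries(api.models)) {
        target.models[model] = mergeBucket(target.models[model], bucket);
      }
    }
  }
  return merged;
}
```

Keeping `latency_sum_ms` and `latency_count` as separate sums lets the average latency be computed correctly after merging (`latency_sum_ms / latency_count`); averaging per-page averages would give small pages too much weight.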

  3. Summary fallback should only happen on 404/405

Right now, if the summary request fails due to a network error, timeout, CORS issue, or connection failure, the status can be undefined and the code may still fall back to the legacy /v0/management/usage endpoint.

This can hide the real error and make refreshes slower. Please fall back only on 404/405, and rethrow all other errors.
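A fallback restricted to 404/405 might look like this. The `status` property on thrown errors and the two injected fetch functions are assumptions about the HTTP client, not usageService's actual API; the check would need to match however the client reports the HTTP status.

```typescript
// Assumption: the HTTP client throws errors that carry `status` only when the
// server actually answered; network/CORS/timeout failures have no status.
interface HttpError extends Error {
  status?: number;
}

// Only "this route does not exist on this backend" justifies the legacy path.
function shouldFallbackToLegacy(err: unknown): boolean {
  const status = (err as HttpError | null | undefined)?.status;
  return status === 404 || status === 405;
}

type SummaryResult<S, L> = { source: "summary"; data: S } | { source: "legacy"; data: L };

async function loadSummaryWithFallback<S, L>(
  fetchSummary: () => Promise<S>,
  fetchLegacyUsage: () => Promise<L>,
): Promise<SummaryResult<S, L>> {
  try {
    return { source: "summary", data: await fetchSummary() };
  } catch (err) {
    if (!shouldFallbackToLegacy(err)) {
      throw err; // 5xx, timeouts, CORS and connection errors surface to the caller
    }
    return { source: "legacy", data: await fetchLegacyUsage() };
  }
}
```

An undefined status is treated as "do not fall back", so a dropped connection produces a visible error instead of a slow request to the full /v0/management/usage payload.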

  4. FTS5 search should support prefix search

After replacing LIKE "%term%" with regular FTS5 token matching, users may need to type the full token. For example, searching for co or code may not match codex, which is a UX regression for the monitoring page.

I do not suggest reverting to LIKE, but I think FTS5 should support prefix search, for example:

  • co -> co*
  • code -> code*
  • multi-term input can be converted into multiple prefix tokens using the current search semantics

The FTS table could also use a prefix index such as prefix='2 3 4'. I do not think arbitrary substring search like dex -> codex is required; prefix search should be enough.
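The co -> co* conversion could be done by a small query builder like the one below. It is sketched in TypeScript for consistency with the other examples even though the search runs in the Go backend. It assumes that multiple terms should all match (FTS5 joins space-separated phrases with an implicit AND); if the current search semantics differ, the join would change.

```typescript
// Convert raw search input into an FTS5 MATCH expression made of prefix terms.
// Assumption: space-separated terms are ANDed (FTS5's implicit operator).
function toFtsPrefixQuery(input: string): string | null {
  const terms = input
    .trim()
    .split(/\s+/)
    .filter((term) => term.length > 0);
  if (terms.length === 0) {
    return null; // empty search: skip the MATCH clause entirely
  }
  // Quote each term so FTS5 syntax in user input (AND, OR, NEAR, ^, :, ...)
  // is treated as text; embedded double quotes are escaped by doubling them.
  // The trailing * marks the term's last token as a prefix: co -> "co"*.
  return terms.map((term) => `"${term.replace(/"/g, '""')}"*`).join(" ");
}
```

The result would be bound as a parameter (for example `WHERE usage_events_fts MATCH ?`) rather than concatenated into SQL. Prefix queries work without a prefix index; declaring the table with `prefix='2 3 4'` adds indexes for 2-, 3-, and 4-character prefixes so that short prefixes stay fast.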

  5. Please clean up the hidden / bidirectional Unicode warning

GitHub still shows a hidden/bidirectional Unicode warning. Unless there is a specific reason to keep those characters, please remove them before merge to avoid confusion in future reviews and maintenance.

  6. accounts / api-keys pagination is still not SQL-level pagination

The frontend payload is now paginated, which is a good improvement. However, the backend still appears to call usageSummary(..., includeDetails=true), then flatten, group, sort, and slice in Go memory for accounts and api-keys.

This means large time ranges can still create backend memory and aggregation pressure. If this PR is intended as a phase-one optimization, that is acceptable, but please document the current boundary in the PR description and ideally add a larger dataset test.

Longer term, accounts and api-keys should also be pushed down into SQL with GROUP BY + ORDER BY + LIMIT/OFFSET.
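Pushing the accounts page into SQL could look roughly like the query builder below: SQLite groups, sorts, and slices the rows, so Go never materializes the full breakdown. It is shown in TypeScript for consistency; apart from the usage_events table name, every column name is hypothetical, and pageSize is assumed to have been validated against the hard cap before this is called.

```typescript
interface SqlQuery {
  sql: string;
  args: number[];
}

// Hypothetical columns on usage_events: account, provider, token_count, created_at_ms.
// The whitelist maps API sort keys to result-column aliases defined in the SELECT.
const ACCOUNT_SORT_COLUMNS = new Map<string, string>([
  ["lastSeenAt", "last_seen_at"],
  ["totalRequests", "total_requests"],
  ["totalTokens", "total_tokens"],
]);

interface AccountPageOptions {
  startMs: number;
  endMs: number;
  page: number; // 1-based
  pageSize: number; // assumed already validated against the hard cap
  sortKey: string;
  descending: boolean;
}

function buildAccountPageQuery(opts: AccountPageOptions): SqlQuery | null {
  const column = ACCOUNT_SORT_COLUMNS.get(opts.sortKey);
  if (column === undefined) {
    return null; // unknown sort key: the handler should answer 400
  }
  const direction = opts.descending ? "DESC" : "ASC";
  const sql = [
    "SELECT account, provider,",
    "       COUNT(*) AS total_requests,",
    "       SUM(token_count) AS total_tokens,",
    "       MAX(created_at_ms) AS last_seen_at",
    "FROM usage_events",
    "WHERE created_at_ms >= ? AND created_at_ms < ?",
    "GROUP BY account, provider",
    // The group keys act as tie-breakers, which keeps OFFSET paging deterministic.
    `ORDER BY ${column} ${direction}, account, provider`,
    "LIMIT ? OFFSET ?",
  ].join("\n");
  return {
    sql,
    args: [opts.startMs, opts.endMs, opts.pageSize, (opts.page - 1) * opts.pageSize],
  };
}
```

A matching `SELECT COUNT(*) FROM (SELECT 1 FROM usage_events WHERE ... GROUP BY account, provider)` would give the total number of groups for the pager, and the api-keys page would follow the same pattern with the key hash as the group key.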

  7. FTS trigger strategy needs clarification

usage_events_fts appears to mainly rely on an insert trigger. If usage_events is append-only, that is acceptable for now. But if retention, pruning, delete, or update logic is added later, update/delete triggers will be needed; otherwise the FTS table may contain stale rows.

Please either add those triggers now, or at least add a clear comment that usage_events is currently append-only and that FTS triggers must be extended before introducing cleanup/mutation logic.

Overall, I am positive on this PR. The direction is correct, it has real value, and it aligns well with the future monitoring improvements in both CPA-Manager and CPA-Manager-Plus.

However, because this may become a foundation for later request monitoring work and possible Plus-side reuse, I would prefer to polish the issues above before merging.

zly2006 (Author) commented May 26, 2026

Thanks so much! I will look for some docs on prefix search and try to improve it.

zly2006 (Author) commented May 26, 2026

Oh, sorry, I didn't notice that this repo is maintenance-only. Do you want me to create a new PR in CPA-Manager-Plus?

seakee (Owner) commented May 26, 2026


No worries, and thanks again for working on this.

My plan is not to reject this PR just because CPA-Manager is mostly maintenance-only. In my view, this kind of monitoring performance improvement still falls within maintenance work, and it was already part of my original plan. This PR actually helps fill an important gap in that plan, so I still think it has value for CPA-Manager.

That said, CPA-Manager-Plus is where I plan to continue more active development. It already has some optimizations around request monitoring and usage aggregation, but the implementation is still not thorough enough, and further performance work is also on the roadmap there.

So my suggestion is:

  • Please continue polishing this PR for CPA-Manager if you are willing. I think it is still useful and relevant here.
  • You are also very welcome to create a new PR in CPA-Manager-Plus, especially if you want to continue improving the monitoring performance work there.
  • The design and fixes from this PR can be a good reference for the Plus-side implementation, but Plus may need some adjustments because its frontend/backend structure is different.
